Reinforcement learning and evolutionary algorithms for non-stationary multi-armed bandit problems
Authors
Abstract
Multi-armed bandit tasks have been used extensively to model the problem of balancing exploitation and exploration. One of the most challenging variants of the multi-armed bandit problem (MABP) is the non-stationary bandit problem, where the agent faces the additional complexity of detecting changes in its environment. In this paper we examine a non-stationary, discrete-time, finite-horizon bandit problem with a finite number of arms and Gaussian rewards. A family of important ad hoc methods exists that is suitable for non-stationary bandit tasks. These learning algorithms, which offer intuition-based solutions to the exploitation–exploration trade-off, have the advantage of not relying on strong theoretical assumptions, while at the same time they can be fine-tuned to produce near-optimal results. An entirely different approach to the non-stationary multi-armed bandit problem presents itself in the form of evolutionary algorithms. We present an evolutionary algorithm implemented to solve the non-stationary bandit problem, along with ad hoc solution algorithms, namely action-value methods with ε-greedy and softmax action selection rules, the probability matching method, and finally the adaptive pursuit method. A number of simulation-based experiments were conducted, and we discuss the methods' performances based on the numerical results obtained. © 2007 Elsevier Inc. All rights reserved.
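The action-value methods mentioned in the abstract can be illustrated with a minimal sketch. The snippet below is not from the paper; it shows the standard constant-step-size ε-greedy update for a Gaussian bandit, which weights recent rewards more heavily and is the usual adaptation for non-stationary problems. The function name and all parameter values are illustrative assumptions.

```python
import random

def epsilon_greedy_bandit(means, epsilon=0.1, alpha=0.1, steps=2000, seed=0):
    """Constant-step-size epsilon-greedy on a Gaussian bandit (sketch).

    `means` holds the true arm means; rewards are Gaussian with unit
    variance. A constant step size `alpha` makes the estimates track a
    drifting environment instead of averaging over all history.
    """
    rng = random.Random(seed)
    q = [0.0] * len(means)                # action-value estimates
    for _ in range(steps):
        if rng.random() < epsilon:        # explore: pick a random arm
            a = rng.randrange(len(means))
        else:                             # exploit: pick the greedy arm
            a = max(range(len(means)), key=lambda i: q[i])
        r = rng.gauss(means[a], 1.0)      # sample a Gaussian reward
        q[a] += alpha * (r - q[a])        # incremental update
    return q
```

A softmax rule would replace the if/else selection with sampling proportional to exp(q[i]/τ) for a temperature τ; the update step stays the same.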
Similar Resources
Upper Confidence Trees and Billiards for Optimal Active Learning
This paper focuses on Active Learning (AL) with bounded computational resources. AL is formalized as a finite-horizon Reinforcement Learning problem and tackled as a single-player game. An approximate optimal AL strategy based on tree-structured multi-armed bandit algorithms and billiard-based sampling is presented, together with a proof of principle of the approach...
Markov Security Games: Learning in Spatial Security Problems
In this paper we present a preliminary investigation of modelling spatial aspects of security games within the context of Markov games. Reinforcement learning is a powerful tool for adaptation in unknown environments; however, the basic single-agent RL algorithms are unfit to be applied in adversarial scenarios. Therefore, we profit from Adversarial Multi-Armed Bandit (AMAB) methods which are des...
Learning Optimal Parameter Values in Dynamic Environment: An Experiment with Softmax Reinforcement Learning Algorithm
1. Introduction Many learning and heuristic search algorithms require tuning of parameters to achieve optimum performance. In stationary and deterministic problem domains this is usually achieved through off-line sensitivity analysis. However, this method breaks down in non-stationary and non-deterministic environments, where the optimal set of values for the parameters keeps changing over time....
Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems
We incorporate statistical confidence intervals in both the multi-armed bandit and the reinforcement learning problems. In the bandit problem we show that given n arms, it suffices to pull the arms a total of O((n/ε²) log(1/δ)) times to find an ε-optimal arm with probability of at least 1−δ. This bound matches the lower bound of Mannor and Tsitsiklis (2004) up to constants. We also devise act...
Robot Beerpong: Model-Based Learning for Shifting Targets
Defining controls for a robot to achieve precise goal-directed movements can be hard when using hand-crafted solutions. Reinforcement Learning, particularly policy-search methods, provides a promising alternative which has already been successfully used for robot learning. Here the task is learned using a function that rewards desired movements and an algorithm that seeks to maximize the reward. In...
Journal:
- Applied Mathematics and Computation
Volume 196, Issue
Pages -
Publication year: 2008